A Study on Out-of-vocabulary Word Modeling for a Segment-based Keyword Spotting System

Authors

  • Alexandros S. Manos
  • Victor W. Zue
Abstract

The purpose of a word spotting system is to detect a certain set of keywords in continuous speech. The most common approach consists of models of the keywords augmented with "filler," or "garbage" models, that are trained to account for non-keyword speech and background noise. Another approach is to use a large vocabulary continuous speech recognition system (LVCSR) to produce the most likely hypothesis string, and then search for the keywords in that string. The latter approach yields much higher performance, but is significantly more costly in computation and the amount of training data required. In this study, we develop a number of segment-based word spotting systems in an effort to achieve performance comparable to the LVCSR spotter, but with only a small fraction of the vocabulary. We investigate a number of methods to model the keywords and background, ranging from a few coarse general models to refined phone representations. The task is to detect sixty-one keywords from continuous speech in the ATIS corpus. We have achieved performance of 89.8% Figure of Merit (FOM) for the LVCSR spotter, 81.8% using phonewords as filler models, and 79.2% using eighteen more general models.
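The second step of the LVCSR approach described in the abstract, searching the recognizer's hypothesis string for keywords, can be illustrated with a minimal sketch. The function name `spot_keywords`, the sample sentence, and the three keywords are hypothetical and chosen only for illustration; they are not the paper's sixty-one keywords, and the recognizer that produces the hypothesis is assumed to exist upstream.

```python
def spot_keywords(hypothesis, keywords):
    """Return (word_position, keyword) pairs for each keyword in a hypothesis string.

    `hypothesis` stands in for the most likely word string produced by a
    recognizer (assumed here, not implemented).
    """
    wanted = {k.lower() for k in keywords}
    hits = []
    for position, word in enumerate(hypothesis.lower().split()):
        if word in wanted:
            hits.append((position, word))
    return hits


# Illustrative input only: a made-up air-travel query and keyword list.
hypothesis = "show me flights from boston to denver on monday"
keywords = ["boston", "denver", "monday"]
print(spot_keywords(hypothesis, keywords))
```

This covers only the lookup stage; the computational cost the abstract mentions lies in producing the hypothesis with a large vocabulary, which the sketch takes as given.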


Similar resources

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. In the proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method, is used in this paper to imp...

Cross-word sub-word units for low-resource keyword spotting

We investigate the use of sub-word lexical units for the detection of out-of-vocabulary (OOV) keywords in the keyword spotting task. Sub-word units based on morphological decomposition and character ngrams are compared. In particular, we examine the benefit of sub-word units that cross word boundaries. Experiments are performed on the IARPA Babel Turkish dataset. Our results demonstrate that cr...

A new approach for modeling OOV words

This paper addresses the problem of Out-Of-Vocabulary (OOV) utterance detection in a small-vocabulary telephone keyword spotting system. We propose a new approach for modeling OOV words in the scenario of a small-vocabulary telephone keyword spotting system. The paper adopts the semi-continuous Hidden Markov Model with multiple codebooks to model the keywords. We propose a two pass procedure...

Out-of-Vocabulary Word Modeling and Rejection for Spanish Keyword Spotting Systems

This paper presents a combination of out-of-vocabulary (OOV) word modeling and rejection techniques in an attempt to accept utterances embedding a keyword and reject utterances with nonkeywords. The goal of this research is to develop a robust, task-independent Spanish keyword spotter and to develop a method for optimizing confidence thresholds for a particular context. To model OOV words, we e...

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Publication date: 1996